hard-attention transformer
- Research Report > Experimental Study (1.00)
- Research Report > New Finding (0.67)
From Next Token Prediction to (STRIPS) World Models -- Preliminary Results
Carlos Núñez-Molina, Vicenç Gómez, Hector Geffner
We consider the problem of learning propositional STRIPS world models from action traces alone, using a deep learning architecture (transformers) and gradient descent. The task is cast as a supervised next token prediction problem where the tokens are the actions, and an action $a$ may follow an action sequence if the hidden effects of the previous actions do not make an action precondition of $a$ false. We show that a suitable transformer architecture can faithfully represent propositional STRIPS world models, and that the models can be learned from sets of random valid (positive) and invalid (negative) action sequences alone. A number of experiments are reported.
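The validity criterion the abstract describes — an action may follow a sequence if the hidden effects of the previous actions do not falsify one of its preconditions — can be sketched as a plain STRIPS applicability check. This is an illustrative sketch, not the paper's code; the toy actions and atom names are assumptions.

```python
# Minimal sketch of STRIPS sequence validity (hypothetical domain,
# not the paper's implementation): an action a may follow a sequence
# if the accumulated effects of the previous actions have not made
# any precondition of a false.
from dataclasses import dataclass

@dataclass(frozen=True)
class Action:
    name: str
    pre: frozenset     # atoms that must hold before applying the action
    add: frozenset     # atoms the action makes true
    delete: frozenset  # atoms the action makes false

def valid_sequence(init: frozenset, actions) -> bool:
    """Return True iff the action sequence is applicable from state `init`."""
    state = set(init)
    for a in actions:
        if not a.pre <= state:  # some precondition is false: invalid sequence
            return False
        state -= a.delete       # apply delete effects
        state |= a.add          # apply add effects
    return True

# Toy example (assumed atoms, blocks-world flavour):
pick = Action("pick", frozenset({"clear_b", "handempty"}),
              frozenset({"holding_b"}), frozenset({"handempty", "clear_b"}))
drop = Action("drop", frozenset({"holding_b"}),
              frozenset({"handempty", "clear_b"}), frozenset({"holding_b"}))

init = frozenset({"clear_b", "handempty"})
print(valid_sequence(init, [pick, drop]))  # positive sequence: True
print(valid_sequence(init, [pick, pick]))  # negative sequence: False
```

In the paper's setup such positive and negative sequences are the only supervision: the states and action schemas themselves are hidden, and the transformer must recover a model consistent with the observed validity labels.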
Masked Hard-Attention Transformers and Boolean RASP Recognize Exactly the Star-Free Languages
Dana Angluin, David Chiang, Andy Yang
We consider transformer encoders with hard attention (in which all attention is focused on exactly one position) and strict future masking (in which each position only attends to positions strictly to its left), and prove that the class of languages recognized by these networks is exactly the star-free languages. Adding position embeddings increases the class of recognized languages to other well-studied classes. A key technique in these proofs is Boolean RASP, a variant of RASP that is restricted to Boolean values. Via the star-free languages, we relate transformers to first-order logic, temporal logic, and algebraic automata theory.
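The attention mechanism described here — all attention focused on exactly one position, each position attending only strictly to its left, with Boolean values as in B-RASP — can be illustrated on a concrete star-free language. The sketch below is an assumption for illustration, not the paper's construction: it uses rightmost hard attention under strict future masking to recognize the star-free language of strings over {a, b} with no factor "bb".

```python
# Sketch of masked hard attention with Boolean values (illustrative,
# not the paper's proof construction): each position attends to exactly
# one position strictly to its left, here the rightmost earlier position
# whose mask bit is True.

def rightmost_prev(i, mask):
    """Hard attention with strict future masking: among positions j < i
    with mask[j] True, return the rightmost; None if there is none."""
    for j in range(i - 1, -1, -1):
        if mask[j]:
            return j
    return None

def no_bb(w: str) -> bool:
    """Recognize the star-free language of strings with no 'bb' factor."""
    is_b = [c == "b" for c in w]
    # One attention layer: each position attends to its immediate
    # predecessor (rightmost j < i under an always-true mask) and reads
    # the Boolean "that position holds 'b'".
    prev_is_b = []
    for i in range(len(w)):
        j = rightmost_prev(i, [True] * len(w))
        prev_is_b.append(j is not None and is_b[j])
    # Accept iff no position holds 'b' while its predecessor holds 'b'.
    return not any(b and p for b, p in zip(is_b, prev_is_b))

print(no_bb("abab"))  # True: no occurrence of "bb"
print(no_bb("abba"))  # False: contains "bb"
```

Every value computed is Boolean, matching the B-RASP restriction; the result of the theorem is that compositions of exactly such masked hard-attention layers capture the star-free languages and nothing more.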